
    Learning from the past with experiment databases

    Thousands of Machine Learning research papers contain experimental comparisons that have usually been conducted with a single focus of interest, and the detailed results are usually lost after publication. Once past experiments are collected in experiment databases, they allow for additional and possibly much broader investigation. In this paper, we show how to use such a repository to answer various interesting research questions about learning algorithms and to verify a number of recent studies. Alongside performing elaborate comparisons and rankings of algorithms, we also investigate the effects of algorithm parameters and data properties, and study the learning curves and bias-variance profiles of algorithms to gain deeper insights into their behavior.
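
    To make the querying idea concrete, here is a minimal, hypothetical sketch of a small "experiment database" built locally with scikit-learn and pandas: a few classifiers are run on a few datasets, every result is stored in a table, and the table is then queried for rankings in the spirit described above. The particular datasets and learners are illustrative choices, not taken from the paper.

        # Minimal sketch of a local "experiment database": run several classifiers
        # on several datasets, store every result, then query the table for rankings.
        import pandas as pd
        from sklearn.datasets import load_breast_cancer, load_digits, load_wine
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        datasets = {"breast_cancer": load_breast_cancer(), "digits": load_digits(),
                    "wine": load_wine()}
        learners = {
            "decision_tree": DecisionTreeClassifier(random_state=0),
            "random_forest": RandomForestClassifier(random_state=0),
            "logistic_regression": LogisticRegression(max_iter=5000),
        }

        records = []
        for data_name, data in datasets.items():
            for learner_name, learner in learners.items():
                scores = cross_val_score(learner, data.data, data.target, cv=5)
                records.append({"dataset": data_name, "learner": learner_name,
                                "accuracy": scores.mean()})

        experiments = pd.DataFrame(records)   # the "experiment database"
        print(experiments.pivot(index="dataset", columns="learner", values="accuracy"))
        print(experiments.groupby("learner")["accuracy"].mean().rank(ascending=False))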

    Automated data pre-processing via meta-learning

    A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
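
    As a rough illustration of the meta-learning loop sketched above (with assumed ingredients: three simple meta-features, a single candidate transformation, and a random-forest meta-learner; the paper's actual meta-features and pre-processing operators differ), one can label each training dataset by whether a transformation helped a base learner, and then learn to predict that label from the meta-features:

        # Sketch of meta-learning for pre-processing selection: learn, from dataset
        # meta-features, whether standard scaling helps a k-NN base learner.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        meta_X, meta_y = [], []

        # Build a meta-dataset from a collection of synthetic base datasets.
        for seed in range(30):
            n_samples = int(rng.integers(200, 600))
            n_features = int(rng.integers(5, 30))
            X, y = make_classification(n_samples=n_samples, n_features=n_features,
                                       random_state=seed)
            if seed % 2:  # give half the datasets wildly different feature scales
                X *= rng.uniform(0.1, 100.0, size=n_features)

            base = KNeighborsClassifier()
            plain = cross_val_score(base, X, y, cv=3).mean()
            scaled = cross_val_score(make_pipeline(StandardScaler(), base), X, y, cv=3).mean()

            # Meta-features: size, dimensionality, spread of feature scales.
            meta_X.append([n_samples, n_features, float(np.ptp(np.log10(X.std(axis=0))))])
            meta_y.append(int(scaled > plain))    # did scaling help?

        # The meta-learner predicts, for an unseen dataset, whether to apply the transformation.
        meta_model = RandomForestClassifier(random_state=0).fit(meta_X, meta_y)
        print(meta_model.predict([[400, 12, 2.5]]))   # hypothetical new dataset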

    A Community-Based Platform for Machine Learning Experimentation

    We demonstrate the practical uses of a community-based platform for the sharing and in-depth investigation of the thousands of machine learning experiments executed every day. It is aimed at researchers and practitioners of data mining techniques, and is publicly available at http://expdb.cs.kuleuven.be. The system offers standards and APIs for sharing experimental results, provides extensive querying capabilities over the gathered results, and allows easy integration into existing data mining toolboxes. We believe such a system may speed up scientific discovery and enhance the scientific rigor of machine learning research.
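
    For a present-day flavour of this kind of shared-results querying, the sketch below uses the openml Python package (OpenML grew out of this line of work) to pull stored evaluations for one task and rank the submitted flows by accuracy. The helper names and arguments are written from memory and should be checked against the current openml-python documentation.

        # Query shared experiment results from OpenML and rank the submitted flows.
        # Note: API details are assumed from memory; verify against the current
        # openml-python documentation before relying on them.
        import openml

        # Fetch up to 500 stored accuracy evaluations for OpenML task 31 (credit-g).
        evals = openml.evaluations.list_evaluations(
            function="predictive_accuracy",
            tasks=[31],
            size=500,
            output_format="dataframe",
        )

        # Average accuracy per flow (i.e., per submitted pipeline), best first.
        ranking = (evals.groupby("flow_name")["value"]
                        .mean()
                        .sort_values(ascending=False))
        print(ranking.head(10))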

    Master your Metrics with Calibration

    Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as the F1-score or AUC-PR (area under the precision-recall curve). Because such metrics depend heavily on the class prior, they make it difficult to interpret the variation of a model's performance over different subpopulations or subperiods in a dataset. In this paper, we propose a way to calibrate the metrics so that they become invariant to the prior. We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of calibrated metrics and show that they improve interpretability and provide better control over what is really measured. We describe specific real-world use cases where calibration is beneficial, such as model monitoring in production, reporting, or fairness evaluation.
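
    As an illustration of the general idea (not necessarily the exact formulation in the paper), precision can be corrected for the class prior by rescaling the true and false positive counts as if the positive rate were a chosen reference value pi0: with observed prior pi, use TP * pi0/pi and FP * (1-pi0)/(1-pi) in place of TP and FP. The sketch below implements this prior-corrected precision and shows that, unlike raw precision, it barely moves when the negative class is made ten times larger.

        # Sketch of a prior-corrected ("calibrated") precision: rescale TP and FP so
        # the metric is computed as if the positive-class prior were a reference pi0.
        # This illustrates the general idea of prior-invariant precision metrics; it
        # is not claimed to reproduce the paper's exact definition.
        import numpy as np

        def calibrated_precision(y_true, y_pred, pi0=0.5):
            y_true = np.asarray(y_true, dtype=bool)
            y_pred = np.asarray(y_pred, dtype=bool)
            pi = y_true.mean()                    # observed positive prior
            tp = np.sum(y_pred & y_true)
            fp = np.sum(y_pred & ~y_true)
            tp_c = tp * pi0 / pi                  # as if positives had prior pi0
            fp_c = fp * (1 - pi0) / (1 - pi)      # as if negatives had prior 1 - pi0
            return tp_c / (tp_c + fp_c)

        # Same classifier (fixed TPR/FPR) evaluated under two different class priors.
        rng = np.random.default_rng(0)

        def simulate(n_pos, n_neg, tpr=0.8, fpr=0.1):
            y_true = np.r_[np.ones(n_pos, bool), np.zeros(n_neg, bool)]
            y_pred = np.r_[rng.random(n_pos) < tpr, rng.random(n_neg) < fpr]
            return y_true, y_pred

        for n_neg in (1_000, 10_000):             # ~50% positives vs. ~9% positives
            y_true, y_pred = simulate(1_000, n_neg)
            raw = np.sum(y_pred & y_true) / np.sum(y_pred)
            print(f"prior={y_true.mean():.2f}  raw precision={raw:.3f}  "
                  f"calibrated={calibrated_precision(y_true, y_pred):.3f}")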

    Case Study on Bagging Stable Classifiers for Data Streams

    Algorithms and the Foundations of Software technology

    The online performance estimation framework: heterogeneous ensemble learning for data streams

    Algorithms and the Foundations of Software technology

    Decoding machine learning benchmarks

    Despite the availability of benchmark machine learning (ML) repositories (e.g., UCI, OpenML), there is still no standard evaluation strategy capable of pointing out which set of datasets should serve as the gold standard for testing different ML algorithms. In recent studies, Item Response Theory (IRT) has emerged as a new approach to elucidate what a good ML benchmark should be. This work applied IRT to explore the well-known OpenML-CC18 benchmark and identify how suitable it is for the evaluation of classifiers. Several classifiers, ranging from classical to ensemble ones, were evaluated using IRT models, which can simultaneously estimate dataset difficulty and classifier ability. The Glicko-2 rating system was applied on top of IRT to summarize the innate ability and aptitude of the classifiers. It was observed that not all datasets from OpenML-CC18 are really useful for evaluating classifiers. Most datasets evaluated in this work (84%) contain generally easy instances (e.g., only around 10% difficult instances). Also, 80% of the instances in half of this benchmark are very discriminating, which can be of great use for pairwise algorithm comparison but does not push classifiers' abilities. This paper presents this new IRT-based evaluation methodology, as well as the tool decodIRT, developed to guide IRT estimation over ML benchmarks.
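
    To give a feel for the IRT machinery involved, the sketch below fits a basic one-parameter (Rasch-style) model to a toy classifier-vs-instance response matrix by maximum likelihood: each instance gets a difficulty, each classifier an ability, and the probability of a correct answer is modelled as sigmoid(ability - difficulty). Real benchmark analyses (and the paper's decodIRT tool) use richer IRT models and estimation procedures; this only shows the shape of the computation.

        # Rasch-style (1PL) IRT sketch: estimate instance difficulty and classifier
        # ability from a binary response matrix R[i, j] = 1 iff classifier j gets
        # instance i right. Toy data stands in for real benchmark results.
        import numpy as np
        from scipy.optimize import minimize
        from scipy.special import expit

        rng = np.random.default_rng(0)
        true_difficulty = rng.normal(size=200)
        true_ability = rng.normal(size=6)
        R = (rng.random((200, 6))
             < expit(true_ability[None, :] - true_difficulty[:, None])).astype(float)
        n_items, n_models = R.shape

        def neg_log_lik(params):
            difficulty, ability = params[:n_items], params[n_items:]
            p = np.clip(expit(ability[None, :] - difficulty[:, None]), 1e-9, 1 - 1e-9)
            # Small L2 penalty pins down the arbitrary common shift of the parameters.
            return -(R * np.log(p) + (1 - R) * np.log(1 - p)).sum() + 1e-3 * (params ** 2).sum()

        result = minimize(neg_log_lik, np.zeros(n_items + n_models), method="L-BFGS-B")
        difficulty_hat, ability_hat = result.x[:n_items], result.x[n_items:]
        print("estimated classifier abilities:", np.round(ability_hat, 2))
        print("share of below-average-difficulty instances:", np.mean(difficulty_hat < 0).round(2))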

    Meta-learning for symbolic hyperparameter defaults

    Computer Systems, Imagery and Media

    Advances in MetaDL: AAAI 2021 Challenge and Workshop

    Algorithms and the Foundations of Software technology